Media Tone Analysis: ABC News Coverage of U.S. Elections

Author

Zixu (Michael) Hao

1 Introduction

This analysis examines ABC News coverage patterns across multiple U.S. election cycles, focusing on tone changes and thematic shifts. Using GDELT’s Global Knowledge Graph data, we analyze:

  1. How media tone fluctuates before and after elections
  2. Which themes dominate coverage during electoral periods
  3. How thematic focus shifts from pre- to post-election periods
  4. Long-term trends in news sentiment across years of political coverage

1.1 Data Overview

The dataset contains ABC News coverage from GDELT’s database, including articles from five election cycles:

  • 2016 Presidential Election
  • 2018 Midterm Elections
  • 2020 Presidential Election
  • 2022 Midterm Elections
  • 2024 Presidential Election

2 Data Processing and Preparation

2.1 Data Import and Initial Cleaning

Code
import pandas as pd
import glob
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from collections import Counter
from scipy.stats import ttest_ind
import matplotlib.dates as mdates

# Set consistent styling for all plots
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Arial', 'DejaVu Sans', 'Liberation Sans']

# Load all ABC News CSV files
csv_files = glob.glob("../data/abc/abc*.csv")
df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True)

# Select relevant columns
columns_of_interest = [
    "parsed_date", "url", "headline_from_url",
    "V2Themes", "V2Locations", "V2Persons",
    "V2Organizations", "V2Tone"
]
df = df[columns_of_interest]

# Convert parsed_date to datetime and ensure it's timezone-naive
df["parsed_date"] = pd.to_datetime(df["parsed_date"], errors="coerce").dt.tz_localize(None)

# Preview structure and missing values
print("DataFrame structure:")
df.info()
print("\nMissing values count:")
print(df.isnull().sum())
print("\nSample data:")
print(df.sample(5))
DataFrame structure:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121973 entries, 0 to 121972
Data columns (total 8 columns):
 #   Column             Non-Null Count   Dtype         
---  ------             --------------   -----         
 0   parsed_date        121973 non-null  datetime64[ns]
 1   url                121973 non-null  object        
 2   headline_from_url  121973 non-null  object        
 3   V2Themes           118067 non-null  object        
 4   V2Locations        111745 non-null  object        
 5   V2Persons          104066 non-null  object        
 6   V2Organizations    93430 non-null   object        
 7   V2Tone             121973 non-null  object        
dtypes: datetime64[ns](1), object(7)
memory usage: 7.4+ MB

Missing values count:
parsed_date              0
url                      0
headline_from_url        0
V2Themes              3906
V2Locations          10228
V2Persons            17907
V2Organizations      28543
V2Tone                   0
dtype: int64

Sample data:
               parsed_date                                                url  \
19405  2016-09-14 13:00:00  http://abcnews.go.com/Politics/ivanka-trump-sl...   
62001  2020-04-03 14:45:00  https://abcnews.go.com/GMA/Living/ways-save-mo...   
25466  2017-03-15 00:45:00  http://abcnews.go.com/Entertainment/ben-afflec...   
103999 2023-10-04 03:15:00  https://abcnews.go.com/US/wireStory/final-vote...   
9076   2015-11-03 13:15:00  http://abcnews.go.com/Business/wireStory/itali...   

                                        headline_from_url  \
19405   ivanka trump slams hillary clinton lack action...   
62001                    ways save money amid coronavirus   
25466   ben affleck completed treatment alcohol addiction   
103999  final vote underway oust kevin mccarthy speake...   
9076    italian bank intesa sees quarter profit rise 5...   

                                                 V2Themes  \
19405   TAX_ECON_PRICE,1417;TAX_FNCACT_WOMAN,944;EPU_P...   
62001   MEDIA_SOCIAL,1166;ECON_WORLDCURRENCIES_DOLLARS...   
25466   TAX_FNCACT_ACTOR,151;TAX_FNCACT_MAN,1161;CRISI...   
103999                             TAX_FNCACT_SPEAKER,95;   
9076    TAX_ETHNICITY_ITALIAN,10;TAX_WORLDLANGUAGES_IT...   

                                              V2Locations  \
19405   2#Iowa, United States#US#USIA##42.0046#-93.214...   
62001         1#Americans#US#US##39.828175#-98.5795#US#18   
25466                                                 NaN   
103999  3#Washington, Washington, United States#US#USD...   
9076    1#Italian#IT#IT##42.8333#12.8333#IT#10;1#Italy...   

                                                V2Persons  \
19405   Ivanka Trump,12;Ivanka Trump,302;Ivanka Trump,...   
62001                   Becky Worley,208;Marie Kondo,1028   
25466              David Pollick,1100;Jennifer Garner,526   
103999                                  Kevin Mccarthy,84   
9076                                                  NaN   

                                          V2Organizations  \
19405   Republican National Convention In Cleveland,16...   
62001                      Google,456;News Technology,170   
25466                                                 NaN   
103999                                                NaN   
9076                                                  NaN   

                                                   V2Tone  
19405   0.584795321637427,2.33918128654971,1.754385964...  
62001   -1.3215859030837,1.3215859030837,2.64317180616...  
25466   5.94795539033457,7.80669144981413,1.8587360594...  
103999  -4.54545454545455,4.54545454545455,9.090909090...  
9076    1.47058823529412,2.20588235294118,0.7352941176...  

2.2 Tone Extraction and Processing

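GDELT’s V2Tone field holds seven comma-separated values: tone, positive score, negative score, polarity, activity reference density, self/group reference density, and word count. We use the first three:

  1. Overall tone: the positive score minus the negative score, on a scale that can run from -100 to +100 (most articles fall between about -10 and +10)
  2. Positive score: the percentage of words in the article with a positive emotional connotation
  3. Negative score: the percentage of words with a negative emotional connotation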

We extract these components for our analysis:

Code
# Split V2Tone into tone, positive_score, and negative_score
tone_split = df["V2Tone"].str.split(",", expand=True)
df["tone"] = pd.to_numeric(tone_split[0], errors="coerce")
df["positive_score"] = pd.to_numeric(tone_split[1], errors="coerce")
df["negative_score"] = pd.to_numeric(tone_split[2], errors="coerce")

# Descriptive statistics for tone components
tone_stats = pd.DataFrame({
    "Tone": df["tone"].describe(),
    "Positive Score": df["positive_score"].describe(),
    "Negative Score": df["negative_score"].describe()
})

print("Tone metrics descriptive statistics:")
print(tone_stats)

# Create a histogram of tone distribution
plt.figure(figsize=(10, 6))
plt.hist(df["tone"].dropna(), bins=30, alpha=0.7, color='steelblue')
plt.axvline(df["tone"].mean(), color='red', linestyle='dashed', linewidth=1, label=f'Mean: {df["tone"].mean():.2f}')
plt.axvline(0, color='black', linestyle='solid', linewidth=1, label='Neutral Tone')
plt.title("Distribution of ABC News Tone Scores", fontsize=14, fontweight='bold')
plt.xlabel("Tone Score")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Tone metrics descriptive statistics:
                Tone  Positive Score  Negative Score
count  121973.000000   121973.000000   121973.000000
mean       -3.094822        2.232947        5.327769
std         4.037294        1.601264        3.239981
min       -47.368421        0.000000        0.000000
25%        -5.454545        1.154734        3.000000
50%        -2.849389        2.005731        4.918033
75%        -0.452489        2.991453        7.124682
max        23.809524       23.809524       47.368421

Note: Most GDELT tone scores fall between about -10 (very negative) and +10 (very positive), but more extreme values do occur. In this dataset the minimum is -47.4 and the maximum is +23.8. The middle half of ABC News articles lies between -5.45 and -0.45, and the mean tone is -3.09 (median -2.85). This reflects the generally negative tone common in news media.
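A quick consistency check makes the link between the first three V2Tone fields concrete: tone should equal the positive score minus the negative score. The sketch below uses the five V2Tone strings from the sample output above. They are cut where pandas truncated the display, so the check needs a small tolerance. `parse_v2tone` is a hypothetical helper, not part of any GDELT library.

```python
# Hypothetical helper: parse the leading V2Tone fields and check tone = positive - negative.
# Values are copied from the sample output above, cut where pandas truncated the display.
samples = [
    "0.584795321637427,2.33918128654971,1.754385964",
    "-1.3215859030837,1.3215859030837,2.64317180616",
    "5.94795539033457,7.80669144981413,1.8587360594",
    "-4.54545454545455,4.54545454545455,9.090909090",
    "1.47058823529412,2.20588235294118,0.7352941176",
]

def parse_v2tone(raw):
    """Return (tone, positive, negative) from the first three comma-separated fields."""
    tone, positive, negative = (float(v) for v in raw.split(",")[:3])
    return tone, positive, negative

for raw in samples:
    tone, positive, negative = parse_v2tone(raw)
    # The residual is tiny; it comes from display truncation and floating-point rounding
    residual = tone - (positive - negative)
    print(f"tone={tone:+.4f}  pos-neg={positive - negative:+.4f}  residual={residual:.1e}")
```

The same check on the full columns would be `(df["tone"] - (df["positive_score"] - df["negative_score"])).abs().max()`. That expression is shown for illustration and has not been run here.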

2.3 Define Key Election Dates

Code
# Define key U.S. elections and COVID emergence
election_events = {
    "2016 Presidential": "2016-11-08",
    "2018 Midterms": "2018-11-06",
    "2020 Presidential": "2020-11-03",
    "2022 Midterms": "2022-11-08",
    "2024 Presidential": "2024-11-05",
    "COVID": "2020-03-10"
}
event_dates = {label: pd.to_datetime(date) for label, date in election_events.items()}

# Create a dictionary without COVID for analyses that only need election dates
election_dates = {k: v for k, v in event_dates.items() if k != "COVID"}
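The election comparisons below use three-month windows on either side of each date. `pd.DateOffset(months=3)` moves by calendar months rather than a fixed number of days, so each window starts and ends on the same day of the month as the election. Below is a minimal sketch of the half-open windows used in the tone comparison (pre: `[date - 3 months, date)`, post: `[date, date + 3 months)`). The helpers `election_windows` and `in_window` are hypothetical names introduced for illustration.

```python
import pandas as pd

def election_windows(date, months=3):
    """Hypothetical helper: half-open pre/post windows around an election date.

    pre  = [date - months, date)
    post = [date, date + months)
    """
    date = pd.Timestamp(date)
    offset = pd.DateOffset(months=months)
    return (date - offset, date), (date, date + offset)

def in_window(timestamps, window):
    # Half-open test: the start is included and the end is excluded, so no article falls in both windows
    start, end = window
    return (timestamps >= start) & (timestamps < end)

pre, post = election_windows("2016-11-08")
print(pre)   # (Timestamp('2016-08-08 00:00:00'), Timestamp('2016-11-08 00:00:00'))
print(post)  # (Timestamp('2016-11-08 00:00:00'), Timestamp('2017-02-08 00:00:00'))

stamps = pd.Series(pd.to_datetime(["2016-11-07 23:45", "2016-11-08 00:00", "2017-02-08 00:00"]))
print(in_window(stamps, pre).tolist())   # [True, False, False]
print(in_window(stamps, post).tolist())  # [False, True, False]
```

Half-open bounds also suit timestamped data. A closed bound such as `<= date - 1 day` ends at midnight and silently drops the rest of that day's articles.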

2.4 Theme Name Mapping

GDELT uses technical theme codes that we convert to more readable names:

Code
# Theme name mapping for readability
theme_name_mapping = {
    "LEADER": "Leaders",
    "TAX_FNCACT_PRESIDENT": "Presidents",
    "USPEC_POLITICS_GENERAL1": "General Politics",
    "IMMIGRATION": "Immigration",
    "WB_2769_JOBS_STRATEGIES": "Job Strategies",
    "WB_2837_IMMIGRATION": "Immigration (WB)",
    "WB_2836_MIGRATION_POLICIES_AND_JOBS": "Migration Policies",
    "WB_2670_JOBS": "Jobs",
    "EPU_CATS_MIGRATION_FEAR_MIGRATION": "Migration Fear",
    "GENERAL_GOVERNMENT": "Government",
    "BORDER": "Border",
    "CRISISLEX_CRISISLEXREC": "Crisis Reporting",
    "NATURAL_DISASTER_HURRICANE": "Hurricanes",
    "TAX_WORLDMAMMALS_ABC": "ABC News (Self)",
    "EPU_POLICY_GOVERNMENT": "Government Policy",
    "TAX_FNCACT_POLICE": "Police",
    "UNGP_CRIME_VIOLENCE": "Crime & Violence",
    "HEALTH_VACCINATION": "Vaccination",
    "WB_639_REPRODUCTIVE_MATERNAL_AND_CHILD_HEALTH": "Reproductive & Child Health",
    "WB_642_CHILD_HEALTH": "Child Health",
    "WB_1459_IMMUNIZATIONS": "Immunizations",
    "UNGP_HEALTHCARE": "Healthcare (UNGP)",
    "TAX_FNCACT_NOMINEE": "Nominees",
    "MEDIA_SOCIAL": "Social Media",
    "ELECTION": "Election",
    "ECON_INFLATION": "Inflation",
    "WB_1104_MACROECONOMIC_VULNERABILITY_AND_DEBT": "Macro Vulnerability & Debt",
    "WB_442_INFLATION": "Inflation (WB)",
    "TAX_POLITICAL_PARTY_DEMOCRATS": "Democrats",
    "TAX_FNCACT_QUEEN": "Queen",
    "TAX_FNCACT_VICE_PRESIDENT": "Vice Presidents",
    "CRISISLEX_C07_SAFETY": "Safety",
    "MANMADE_DISASTER_IMPLIED": "Manmade Disaster",
    "WB_2432_FRAGILITY_CONFLICT_AND_VIOLENCE": "Conflict & Fragility"
}

3 Tone Analysis

3.2 Tone Patterns Around Election Events

Code
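# Monthly average tone (month-start timestamps) and a trailing 3-month rolling average
monthly_tone = df.dropna(subset=["parsed_date"]).set_index("parsed_date")["tone"].resample("MS").mean()
tone_trend = monthly_tone.rename_axis("year_month").reset_index()
tone_trend["rolling_avg"] = tone_trend["tone"].rolling(window=3, min_periods=1).mean()

# Plot with election and COVID overlays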
plt.figure(figsize=(14, 6))
plt.plot(tone_trend["year_month"], tone_trend["tone"], alpha=0.3, label='Monthly Average')
plt.plot(tone_trend["year_month"], tone_trend["rolling_avg"], color='red', label='3-Month Rolling Avg', linewidth=2)

# Draw event lines with improved styling
for label, date in event_dates.items():
    color = 'blue' if 'COVID' not in label else 'darkgreen'
    plt.axvline(date, color=color, linestyle='--', alpha=0.7)
    y_pos = tone_trend["tone"].min() + 0.3 if 'COVID' not in label else tone_trend["tone"].min() + 0.6
    plt.text(date, y_pos, label, rotation=90, verticalalignment='bottom', fontsize=10, fontweight='bold')

plt.title("Media Tone With Key Events Highlighted", fontsize=16, fontweight='bold')
plt.xlabel("Year", fontsize=12)
plt.ylabel("Average Tone Score", fontsize=12)
plt.grid(True, alpha=0.3)
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()

Key observations from the timeline:

  • Election Effects: Each election appears to correspond with shifts in media tone
  • COVID Impact: The pandemic’s onset coincides with a notable drop in tone, suggesting increased negative coverage
  • Presidential vs. Midterms: Presidential elections (2016, 2020, 2024) show more pronounced tone fluctuations than midterms (2018, 2022)

3.3 Election Tone Shift Analysis

Code
# Analyze tone before vs. after each election
results = []
for label, date in election_dates.items():
    pre = df[(df["parsed_date"] >= date - pd.DateOffset(months=3)) & (df["parsed_date"] < date)]
    post = df[(df["parsed_date"] >= date) & (df["parsed_date"] < date + pd.DateOffset(months=3))]

    results.append({
        "election": label,
        "pre_avg_tone": pre["tone"].mean(),
        "post_avg_tone": post["tone"].mean(),
        "tone_shift": post["tone"].mean() - pre["tone"].mean(),
        "pre_articles": len(pre),
        "post_articles": len(post)
    })

# Create results DataFrame
tone_shift_df = pd.DataFrame(results)
print("Tone shifts before and after elections:")
print(tone_shift_df)

# Setup for bar plot
labels = tone_shift_df["election"]
x = np.arange(len(labels))
width = 0.35

plt.figure(figsize=(12, 7))
bars1 = plt.bar(x - width/2, tone_shift_df["pre_avg_tone"], width, label='3 Months Before', color='#3274A1')
bars2 = plt.bar(x + width/2, tone_shift_df["post_avg_tone"], width, label='3 Months After', color='#E1812C')

plt.ylabel("Average Tone Score", fontsize=12)
plt.title("News Tone Before vs. After U.S. Elections", fontsize=16, fontweight='bold')
plt.xticks(x, labels, rotation=45, ha="right", fontsize=11)
plt.axhline(0, color='black', linewidth=0.5)
plt.legend(fontsize=11)
plt.grid(axis='y', linestyle='--', alpha=0.5)

# Annotate tone shift on top with improved formatting
for i in range(len(x)):
    shift = tone_shift_df["tone_shift"].iloc[i]
    y_pos = max(tone_shift_df["pre_avg_tone"].iloc[i], tone_shift_df["post_avg_tone"].iloc[i]) + 0.15
    plt.text(x[i], y_pos,
             f"+{shift:.2f}" if shift > 0 else f"{shift:.2f}", 
             ha='center', fontsize=11, fontweight='bold',
             color='green' if shift > 0 else 'red')

# Add article counts near the bottom of the axes, below the lowest bar
y_bottom = tone_shift_df[["pre_avg_tone", "post_avg_tone"]].min().min() - 0.8
for bar_collection, counts in [(bars1, tone_shift_df["pre_articles"]), (bars2, tone_shift_df["post_articles"])]:
    for bar, n in zip(bar_collection, counts):
        plt.text(bar.get_x() + bar.get_width()/2, y_bottom + 0.05,
                 f"n={n:,}", ha='center', va='bottom',
                 fontsize=8, rotation=90, color='dimgrey')

# Put the lower limit below the most negative average so that no bar is clipped
plt.ylim(bottom=y_bottom)
plt.tight_layout()
plt.show()
Tone shifts before and after elections:
            election  pre_avg_tone  post_avg_tone  tone_shift  pre_articles  \
0  2016 Presidential     -3.300117      -2.996192    0.303925          3004   
1      2018 Midterms     -3.190385      -2.877029    0.313356          2979   
2  2020 Presidential     -3.270054      -2.787912    0.482143          3015   
3      2022 Midterms     -3.280437      -3.213493    0.066944          2999   
4  2024 Presidential     -2.997438      -2.582580    0.414858          2992   

   post_articles  
0           2982  
1           3475  
2           3041  
3           2987  
4           3011  

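Key Finding: All five elections showed a positive tone shift in the three months after the election compared with the three months before. The shifts range from +0.07 (2022 Midterms) to +0.48 (2020 Presidential). Post-election coverage therefore tends to be somewhat less negative than pre-election coverage. However, the 2022 shift is small and is not statistically significant (see 3.3.1).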

3.3.1 Statistical Significance Testing

Code
# Perform t-tests for statistical significance
significance_results = []
for label, date in election_dates.items():
    pre = df[(df["parsed_date"] >= date - pd.DateOffset(months=3)) & (df["parsed_date"] < date)]["tone"].dropna()
    post = df[(df["parsed_date"] >= date) & (df["parsed_date"] < date + pd.DateOffset(months=3))]["tone"].dropna()
    
    t_stat, p_val = ttest_ind(post, pre, equal_var=False)
    significance_results.append({
        "Election": label,
        "t-statistic": round(t_stat, 4),
        "p-value": round(p_val, 4),
        "Significant": "Yes" if p_val < 0.05 else "No"
    })

# Convert to DataFrame for cleaner display
sig_df = pd.DataFrame(significance_results)
print("Statistical significance of tone shifts (t-test):")
print(sig_df)
Statistical significance of tone shifts (t-test):
            Election  t-statistic  p-value Significant
0  2016 Presidential       2.8542   0.0043         Yes
1      2018 Midterms       3.1605   0.0016         Yes
2  2020 Presidential       5.0186   0.0000         Yes
3      2022 Midterms       0.6355   0.5251          No
4  2024 Presidential       3.9417   0.0001         Yes

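Interpretation: Welch's t-test (`equal_var=False`) does not assume that the two windows have equal variances. A p-value below 0.05 means that a difference at least this large would be unlikely if the pre- and post-election mean tones were actually equal. It does not show what caused the shift. The t-statistic is the mean difference divided by its standard error, so it grows with the number of articles and does not measure effect size. The shifts themselves are modest: even the largest (+0.48, 2020) is only about 0.12 of the overall tone standard deviation (4.04).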
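Cohen's d is one common effect size. It divides the mean difference by a pooled standard deviation. The sketch below computes it alongside Welch's t, the statistic that `ttest_ind(..., equal_var=False)` reports. It uses only the standard library and toy data rather than the GDELT windows, and the function names are illustrative. Multiplying the sample size by 100 raises t roughly tenfold (about √100), while d stays near 0.6.

```python
import math
import statistics

def welch_t(post, pre):
    """Welch's t-statistic for mean(post) - mean(pre), not assuming equal variances."""
    se = math.sqrt(statistics.variance(post) / len(post) + statistics.variance(pre) / len(pre))
    return (statistics.fmean(post) - statistics.fmean(pre)) / se

def cohens_d(post, pre):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(post), len(pre)
    pooled_var = ((n1 - 1) * statistics.variance(post) + (n2 - 1) * statistics.variance(pre)) / (n1 + n2 - 2)
    return (statistics.fmean(post) - statistics.fmean(pre)) / math.sqrt(pooled_var)

# Toy tone scores (30 per window); post-election values are shifted up by 0.5
pre = [-4.0, -3.0, -2.0] * 10
post = [x + 0.5 for x in pre]

# Same distribution, but 100 times as many articles in each window
pre_big, post_big = pre * 100, post * 100

print(f"n=30:   t={welch_t(post, pre):.2f}, d={cohens_d(post, pre):.2f}")
print(f"n=3000: t={welch_t(post_big, pre_big):.2f}, d={cohens_d(post_big, pre_big):.2f}")
```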

4 Theme Analysis

4.1 Overall Theme Distribution

Code
# Drop missing themes and split by semicolon
themes_series = df["V2Themes"].dropna().str.split(";")

# Flatten the list of all theme entries
all_themes = [theme.split(",")[0] for sublist in themes_series for theme in sublist if theme]

# Count the most frequent themes
theme_counts = Counter(all_themes).most_common(20)

# Map to friendly names
friendly_counts = [(theme_name_mapping.get(theme, theme), count) for theme, count in theme_counts]

# Create a visually appealing bar chart
theme_df = pd.DataFrame(friendly_counts, columns=['Theme', 'Count'])
theme_df = theme_df.sort_values('Count', ascending=False)

plt.figure(figsize=(12, 8))
bars = plt.barh(theme_df['Theme'], theme_df['Count'], color=plt.cm.viridis(np.linspace(0, 0.8, len(theme_df))))

# Add count labels
for bar in bars:
    width = bar.get_width()
    plt.text(width + (width * 0.01), bar.get_y() + bar.get_height()/2, 
            f'{width:,.0f}', ha='left', va='center', fontsize=10, 
            fontweight='bold', color='dimgrey')

plt.title("Top 20 Themes in ABC News Coverage", fontsize=16, fontweight='bold')
plt.xlabel('Frequency', fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.tight_layout()
plt.show()

The visualization shows ABC News’ dominant themes across the full dataset period:

  • Political Focus: Presidential coverage, leadership, and general politics dominate
  • Immigration: A consistently prominent theme in ABC News coverage
  • Other Notable Themes: Government operations, crisis reporting, and economic issues
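These frequencies are mention counts. The parsing step keeps every `CODE,offset` entry, and V2Themes appears to list a theme once for each position where it is detected. The sample output shows the same repetition in V2Persons (`Ivanka Trump,12;Ivanka Trump,302`). As a result, long articles that repeat a theme weigh more. The minimal standard-library sketch below contrasts mention counts with article counts, where each theme counts at most once per article. The theme strings are made up for illustration, and the helper names are hypothetical.

```python
from collections import Counter

def theme_codes(v2themes):
    """Split a V2Themes string ('CODE,offset;CODE,offset;...') into theme codes."""
    # Skip the empty entry produced by the trailing semicolon
    return [entry.split(",")[0] for entry in v2themes.split(";") if entry]

def mention_counts(rows):
    # Every occurrence counts, so a theme repeated within one article is counted repeatedly
    return Counter(code for row in rows for code in theme_codes(row))

def article_counts(rows):
    # set() removes repeats, so each theme counts at most once per article
    return Counter(code for row in rows for code in set(theme_codes(row)))

# Made-up V2Themes values for two articles
rows = [
    "ELECTION,120;LEADER,45;ELECTION,610;TAX_FNCACT_PRESIDENT,88;",
    "LEADER,12;IMMIGRATION,300;",
]
print(mention_counts(rows))  # ELECTION counted twice (two mentions in one article)
print(article_counts(rows))  # ELECTION counted once (one article)
```

Article counts answer "how many stories touched this theme", while mention counts also reflect how heavily each story dwelt on it. Which one is appropriate depends on the question being asked.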

4.2 Pre-Election Theme Analysis

Code
# Create visualization for themes 3 months before each election
# Define a professional color palette
palette = plt.cm.viridis(np.linspace(0, 0.9, 10))

# Create subplot grid with adjusted layout
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5*len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 3 months before
for i, (election, date) in enumerate(election_dates.items()):
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)
    
    # Get themes for this time period
    election_window = (df["parsed_date"] >= pre_start) & (df["parsed_date"] <= pre_end)
    pre_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")
    
    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in pre_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)
    
    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]
    
    # Create DataFrame for this election
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count')
    
    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)
    
    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height()/2, 
                f'{width:,.0f}', ha='left', va='center', fontsize=10, 
                fontweight='bold', color='dimgrey')
    
    # Set titles and labels
    ax.set_title(f"Top Media Themes: 3 Months Before {election}", 
                fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()
    
    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    
    # Annotate the date range
    date_range = f"({pre_start.strftime('%b %d, %Y')} - {pre_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, 
            ha='center', fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Pre-Election Media Focus: ABC News Themes Before Each Election", 
             fontsize=20, y=1.02, fontweight='bold')

plt.tight_layout()
plt.show()

Key Observations:

  • Presidential themes dominate coverage in presidential election years
  • Immigration appears consistently across multiple election cycles
  • Some themes are election-specific (e.g., the prominence of healthcare in certain cycles)

4.3 Post-Election Theme Analysis (3 Months)

Code
# Create visualization for themes 3 months after each election
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5*len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 3 months after
for i, (election, date) in enumerate(election_dates.items()):
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)
    
    # Get themes for this time period
    election_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")
    
    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in post_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)
    
    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]
    
    # Create DataFrame for this election
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count')
    
    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)
    
    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height()/2, 
                f'{width:,.0f}', ha='left', va='center', fontsize=10, 
                fontweight='bold', color='dimgrey')
    
    # Set titles and labels
    ax.set_title(f"Top Media Themes: 3 Months After {election}", 
                fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()
    
    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    
    # Annotate the date range
    date_range = f"({post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, 
            ha='center', fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Post-Election Media Focus: ABC News Themes After Each Election", 
             fontsize=20, y=1.02, fontweight='bold')

plt.tight_layout()
plt.show()

Post-Election Media Focus:

  • Presidential themes often remain dominant immediately after elections
  • Government administration themes become more prominent in the post-election period
  • Some campaign-related themes decrease in prominence
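The observations above come from comparing two top-10 charts by eye. Set operations make that comparison explicit by listing which themes enter the top n, which drop out, and which persist. The sketch below assumes theme counts held in `Counter` objects like those built in this section. The helper name and the counts are hypothetical. Ties in `most_common` are broken by insertion order, so themes on the boundary of the top n can flip between lists.

```python
from collections import Counter

def top_n_changes(pre_counts, post_counts, n=10):
    """Hypothetical helper: which themes enter or leave the top n between two periods."""
    pre_top = {theme for theme, _ in pre_counts.most_common(n)}
    post_top = {theme for theme, _ in post_counts.most_common(n)}
    return {
        "entered": sorted(post_top - pre_top),
        "dropped": sorted(pre_top - post_top),
        "stayed": sorted(pre_top & post_top),
    }

# Made-up counts, for illustration only
pre_counts = Counter({"ELECTION": 900, "LEADER": 800, "TAX_FNCACT_NOMINEE": 500, "IMMIGRATION": 300})
post_counts = Counter({"LEADER": 950, "ELECTION": 400, "GENERAL_GOVERNMENT": 600, "IMMIGRATION": 250})

print(top_n_changes(pre_counts, post_counts, n=3))
```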

4.4 Extended Post-Election Coverage (6 Months)

Code
# Create visualization for themes 6 months after each election
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5*len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 6 months after
for i, (election, date) in enumerate(election_dates.items()):
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=6)
    
    # Get themes for this time period
    election_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")
    
    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in post_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)
    
    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]
    
    # Create DataFrame for this election
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count')
    
    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)
    
    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height()/2, 
                f'{width:,.0f}', ha='left', va='center', fontsize=10, 
                fontweight='bold', color='dimgrey')
    
    # Set titles and labels
    ax.set_title(f"Top Media Themes: 6 Months After {election}", 
                fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()
    
    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    
    # Annotate the date range
    date_range = f"({post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, 
            ha='center', fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Extended Post-Election Coverage: 6-Month ABC News Themes", 
             fontsize=20, y=1.02, fontweight='bold')

plt.tight_layout()
plt.show()

Extended Coverage Patterns:

  • Over a 6-month post-election period, coverage shows a broader range of themes
  • Governance and policy themes become more prominent than in the immediate post-election coverage
  • Emerging issues often rise in prominence, diluting election-specific themes
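The 6-month window is twice as long as the 3-month one, so raw theme counts from Sections 4.3 and 4.4 are not directly comparable. Dividing by the number of articles in each window gives a rate (mentions per 100 articles) that can be compared across windows of different lengths. The sketch below uses made-up numbers, and `rate_per_100` is a hypothetical helper. In practice the article count would be the number of rows that fall inside each date window.

```python
def rate_per_100(mentions, n_articles):
    """Theme mentions per 100 articles in a window (hypothetical helper)."""
    if n_articles == 0:
        raise ValueError("window contains no articles")
    return 100 * mentions / n_articles

# Made-up figures: the 6-month window has more mentions in absolute terms,
# but a lower rate, so the theme is relatively less prominent there
three_month = rate_per_100(mentions=1500, n_articles=3000)
six_month = rate_per_100(mentions=2400, n_articles=6000)
print(f"3 months: {three_month:.1f} per 100 articles; 6 months: {six_month:.1f}")
```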

4.5 Theme Shifts Before vs. After Elections

Code
# Function to get theme counts in a specific date range
def get_theme_counts(start_date, end_date):
    mask = (df["parsed_date"] >= start_date) & (df["parsed_date"] <= end_date)
    themes_series = df.loc[mask, "V2Themes"].dropna().str.split(";")
    all_themes = [theme.split(",")[0] for sublist in themes_series for theme in sublist if theme]
    return Counter(all_themes)

# Analyze themes before and after each election
theme_shift_analysis = {}
theme_shift_data = []  # Create a list to store data for the DataFrame

for election, date in election_dates.items():
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)

    pre_counts = get_theme_counts(pre_start, pre_end)
    post_counts = get_theme_counts(post_start, post_end)

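    # Difference in theme frequencies over the union of both periods, so themes that
    # disappear after the election are included (Counter returns 0 for missing keys)
    theme_diff = {theme: post_counts[theme] - pre_counts[theme]
                  for theme in sorted(set(pre_counts) | set(post_counts))}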

    # Sort themes by the magnitude of change
    sorted_theme_diff = sorted(theme_diff.items(), key=lambda item: abs(item[1]), reverse=True)
    
    # Store top 10 themes with the most change
    theme_shift_analysis[election] = sorted_theme_diff[:10]
    
    # Add to the data list for DataFrame
    for theme, shift in sorted_theme_diff[:10]:
        theme_shift_data.append({
            "Election": election,
            "Theme": theme,
            "Tone Shift": shift  # change in mention count (post - pre), not a tone score
        })

# Create theme_df from the collected data
theme_df = pd.DataFrame(theme_shift_data)

# Apply theme name mapping
theme_df["Theme"] = theme_df["Theme"].map(lambda x: theme_name_mapping.get(x, x))
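When differencing two `Counter` objects, the choice of operation matters. The `-` operator keeps only positive results, and a comprehension that loops over one counter's keys misses themes that appear only in the other. `Counter.subtract()`, or a loop over the union of keys, keeps increases, decreases, and disappearances alike. The illustration below uses made-up counts.

```python
from collections import Counter

pre = Counter({"ELECTION": 500, "TAX_FNCACT_NOMINEE": 200, "LEADER": 300})
post = Counter({"ELECTION": 350, "LEADER": 420, "GENERAL_GOVERNMENT": 150})

# The '-' operator drops zero and negative results, hiding declines
print(post - pre)  # Counter({'GENERAL_GOVERNMENT': 150, 'LEADER': 120})

# Looping over post's keys only misses TAX_FNCACT_NOMINEE, which vanished after the election
one_sided = {theme: post[theme] - pre[theme] for theme in post}

# subtract() works in place and keeps negative changes for every key in either counter
diff = Counter(post)
diff.subtract(pre)
ranked = sorted(diff.items(), key=lambda item: abs(item[1]), reverse=True)
print(ranked)
```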

4.5.1 Direct Theme Comparison Visualizations

Code
# Create a visualization comparing top themes before and after each election
for election, date in election_dates.items():
    # Define time periods
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)
    
    # Count all themes in each window (full counters, not just the top 15)
    pre_window = (df["parsed_date"] >= pre_start) & (df["parsed_date"] <= pre_end)
    pre_themes = df.loc[pre_window, "V2Themes"].dropna().str.split(";")
    pre_counter = Counter(theme.split(",")[0] for sublist in pre_themes for theme in sublist if theme)

    post_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_themes = df.loc[post_window, "V2Themes"].dropna().str.split(";")
    post_counter = Counter(theme.split(",")[0] for sublist in post_themes for theme in sublist if theme)

    # Candidate themes: the union of the top 15 in each period
    all_themes = ({t for t, _ in pre_counter.most_common(15)}
                  | {t for t, _ in post_counter.most_common(15)})

    # Look up counts in the full counters so that a theme ranked 16th is not treated as 0
    comparison_data = []
    for theme in all_themes:
        comparison_data.append({
            'Theme': theme_name_mapping.get(theme, theme),
            'Pre-Election': pre_counter[theme],
            'Post-Election': post_counter[theme],
            'Difference': post_counter[theme] - pre_counter[theme]
        })

    # Create DataFrame and keep the 12 largest absolute changes
    comp_df = pd.DataFrame(comparison_data)
    comp_df = comp_df.sort_values('Difference', key=abs, ascending=False).head(12)

    # Express counts as a share of all theme mentions in each window
    total_pre = sum(pre_counter.values())
    total_post = sum(post_counter.values())
    comp_df['Pre %'] = comp_df['Pre-Election'] / total_pre * 100
    comp_df['Post %'] = comp_df['Post-Election'] / total_post * 100
    comp_df['% Change'] = comp_df['Post %'] - comp_df['Pre %']
    
    # Create figure with multiple subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 10), gridspec_kw={'width_ratios': [3, 1]})
    
    # Plot 1: Side-by-side bar chart of counts
    comp_df = comp_df.sort_values('Theme')  # Sort alphabetically for this chart
    x = np.arange(len(comp_df))
    width = 0.35
    
    # Plot bars
    pre_bars = ax1.barh(x - width/2, comp_df['Pre-Election'], width, 
                      label='Pre-Election', color='#3274A1', alpha=0.8)
    post_bars = ax1.barh(x + width/2, comp_df['Post-Election'], width,
                       label='Post-Election', color='#E1812C', alpha=0.8)
    
    # Add labels and styling
    ax1.set_yticks(x)
    ax1.set_yticklabels(comp_df['Theme'])
    ax1.invert_yaxis()
    ax1.legend(loc='upper right')
    ax1.set_title(f'Theme Frequency Comparison for {election}', fontsize=16, fontweight='bold')
    ax1.set_xlabel('Count', fontsize=12)
    
    # Add count labels
    for bars, counts in [(pre_bars, comp_df['Pre-Election']), 
                         (post_bars, comp_df['Post-Election'])]:
        for bar, count in zip(bars, counts):
            if count > 0:
                ax1.text(count + 50, bar.get_y() + bar.get_height()/2, 
                       f'{count:,.0f}', ha='left', va='center', fontsize=9)
    
    # Plot 2: Net change in raw counts (a simple alternative to a waterfall chart)
    comp_df = comp_df.sort_values('Difference')  # Sort by difference for this chart
    colors = ['#E15759' if diff < 0 else '#4E79A7' for diff in comp_df['Difference']]
    
    # Plot the differences
    diff_bars = ax2.barh(comp_df['Theme'], comp_df['Difference'], color=colors)
    
    # Add a vertical line at zero
    ax2.axvline(x=0, color='black', linestyle='-', alpha=0.3)
    
    # Label each bar with its signed difference, placed just past the end of the bar
    for bar in diff_bars:
        diff = bar.get_width()
        label_x_pos = diff + np.sign(diff) * 50
        ha = 'left' if diff > 0 else 'right'
        ax2.text(label_x_pos, bar.get_y() + bar.get_height()/2, 
                f'{diff:+,.0f}', ha=ha, va='center', fontsize=9)
    
    ax2.set_title('Net Change in Theme Frequency', fontsize=16, fontweight='bold')
    ax2.set_xlabel('Difference (Post - Pre)', fontsize=12)
    # Keep the theme labels: this panel is sorted by difference, so its rows
    # do not line up with the alphabetically sorted left panel
    ax2.tick_params(axis='y', labelsize=9)
    
    # Add overall title and subtitles
    plt.suptitle(f'Media Focus Shift: Before vs. After {election}', fontsize=20, fontweight='bold', y=0.98)
    pre_range = f"Pre: {pre_start.strftime('%b %d, %Y')} - {pre_end.strftime('%b %d, %Y')}"
    post_range = f"Post: {post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')}"
    fig.text(0.5, 0.91, f"{pre_range} | {post_range}", ha='center', fontsize=12, fontstyle='italic')
    
    # Add explanatory notes
    fig.text(0.5, 0.03, 
            "Note: Blue bars in the right panel indicate themes that gained prominence after the election, while red bars show declining themes.",
            ha='center', fontsize=10, fontstyle='italic')
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.88, bottom=0.08)  # Leave room for the subtitle and the bottom note
    plt.show()

Key Insights:

  • Each election shows distinctive shifts in thematic focus
  • Some themes consistently gain prominence after elections (e.g., Presidential coverage)
  • Campaign-specific themes often decline after elections
  • The right panel shows which themes gained (blue) or lost (red) mentions in raw counts
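
The right panel plots raw count differences, which grow with overall article volume, so a theme can gain mentions while losing share. The comp_df built above also computes a percentage-point change ('% Change') that is not plotted. The toy counts below are hypothetical, not taken from the dataset, and show how the two measures can disagree.

```python
import pandas as pd

# Hypothetical toy counts (not from the dataset); overall volume doubles after the election
comp = pd.DataFrame({
    "Theme": ["Presidents", "Campaign", "Economy"],
    "Pre-Election": [400, 300, 300],
    "Post-Election": [900, 400, 700],
})

# Raw difference, as plotted in the right-hand panel
comp["Difference"] = comp["Post-Election"] - comp["Pre-Election"]

# Share of the counted mentions in each period, and the percentage-point change
comp["Pre %"] = comp["Pre-Election"] / comp["Pre-Election"].sum() * 100
comp["Post %"] = comp["Post-Election"] / comp["Post-Election"].sum() * 100
comp["% Change"] = comp["Post %"] - comp["Pre %"]

print(comp[["Theme", "Difference", "% Change"]].round(1))
```

Here "Campaign" gains 100 mentions (a blue bar in the right panel) while its share falls by 10 percentage points, because every theme grew after the election.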

4.6 Theme Frequency Shift Heatmaps

Code
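# Heatmap of post-minus-pre theme frequency shifts across elections.
# The 'Tone Shift' column is plotted as the raw count difference described below;
# fillna(0) displays themes absent from an election's list as "no change"
pivot_df = theme_df.pivot(index="Theme", columns="Election", values="Tone Shift").fillna(0)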

# Overall heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(pivot_df, cmap="RdBu_r", center=0, annot=True, fmt=".0f", linewidths=0.5)
plt.title("Theme Frequency Shifts Across Elections", fontsize=16, fontweight='bold')
plt.ylabel("Theme", fontsize=12)
plt.xlabel("Election", fontsize=12)
plt.tight_layout()
plt.show()

# Individual election heatmaps for clearer detail
unique_elections = theme_df["Election"].unique()

for election in unique_elections:
    # Filter for this election and create a pivot table
    election_df = theme_df[theme_df["Election"] == election]
    single_df = election_df.pivot(index="Theme", columns="Election", values="Tone Shift").fillna(0)
    
    plt.figure(figsize=(8, 10))
    sns.heatmap(single_df, cmap="RdBu_r", center=0, annot=True, fmt=".0f", linewidths=0.5)
    plt.title(f"Theme Frequency Shifts – {election}", fontsize=16, fontweight='bold')
    plt.xlabel("Election")
    plt.ylabel("Theme")
    plt.tight_layout()
    plt.show()

Understanding the Heatmap:

  • Rows (Y-axis): Each theme extracted from ABC News coverage
  • Columns (X-axis): Different election cycles
  • Colors:
    • Red = Increased theme frequency after the election
    • Blue = Decreased theme frequency after the election
    • White = Little or no change (no significance test is applied, and themes missing from an election are filled with 0)
  • Numbers: The raw count difference between post-election and pre-election periods
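
Because the pivot tables above call fillna(0), a theme that never appears in an election's list looks exactly like a theme whose count did not change. A minimal sketch of the distinction, using hypothetical toy rows in the same long format (Theme, Election, difference):

```python
import pandas as pd

# Hypothetical toy data: one row per (Theme, Election) with a post-minus-pre difference.
# "Trade" has no row for "Election B"; "Health" has a genuine zero change there.
toy = pd.DataFrame({
    "Theme": ["Immigration", "Immigration", "Trade", "Health", "Health"],
    "Election": ["Election A", "Election B", "Election A", "Election A", "Election B"],
    "Shift": [120, -40, 15, 35, 0],
})

pivot = toy.pivot(index="Theme", columns="Election", values="Shift")

# fillna(0) makes a missing theme indistinguishable from a real zero change
filled = pivot.fillna(0)

# Boolean mask of cells that have no underlying data
missing = pivot.isna()

print(filled)
print(missing)
```

seaborn's heatmap accepts a mask argument and leaves NaN cells blank, so passing the unfilled pivot (or mask=missing) would keep absent themes visually separate from true zeros.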

4.7 Theme Evolution Timeline

Code
# Create a timeline visualization showing how key themes evolved across all elections

# Select important themes to track over time
key_themes = ['Immigration', 'General Politics']
theme_codes = {v: k for k, v in theme_name_mapping.items() if v in key_themes}
theme_codes.update({k: k for k in key_themes if k not in theme_name_mapping.values()})

# Get monthly data for these themes
monthly_data = []

# Convert min and max years to integers explicitly
min_year = int(df['parsed_date'].dt.year.min())
max_year = int(df['parsed_date'].dt.year.max() + 1)
data_start, data_end = df['parsed_date'].min(), df['parsed_date'].max()

# Create timeline with monthly data points
for year in range(min_year, max_year):
    for month in range(1, 13):
        start_date = pd.Timestamp(f"{year}-{month:02d}-01")
        # Half-open window [start_date, next_start) so articles published
        # after midnight on the last day of the month are not dropped
        next_start = start_date + pd.offsets.MonthBegin(1)
        
        # Skip months that do not overlap the dataset (keeps a partial first month)
        if next_start <= data_start or start_date > data_end:
            continue
        
        # Get themes for this month
        mask = (df["parsed_date"] >= start_date) & (df["parsed_date"] < next_start)
        if not mask.any():  # Skip months with no data
            continue
            
        month_themes = df.loc[mask, "V2Themes"].dropna().str.split(";")
        all_month_themes = [theme.split(",")[0] for sublist in month_themes for theme in sublist if theme]
        theme_counter = Counter(all_month_themes)
        # Total theme mentions this month (all themes, not just the tracked ones)
        month_total = sum(theme_counter.values())
        
        # Get counts for our key themes
        for display_name, code in theme_codes.items():
            monthly_data.append({
                'date': start_date,
                'theme': display_name,
                'count': theme_counter.get(code, 0),
                'total': month_total
            })

# Convert to DataFrame
timeline_df = pd.DataFrame(monthly_data)

# Normalize by ALL theme mentions in the month; summing only the tracked themes
# would force the shares of the two tracked themes to add up to 100%
timeline_df = timeline_df[timeline_df['total'] > 0].copy()
timeline_df['percentage'] = (timeline_df['count'] / timeline_df['total'] * 100).round(2)

# Plot the theme timeline
plt.figure(figsize=(20, 10))

# Get unique themes and assign colors
unique_themes = timeline_df['theme'].unique()
colors = plt.cm.Dark2(np.linspace(0, 1, len(unique_themes)))
theme_colors = dict(zip(unique_themes, colors))

# Create separate trend line for each theme
for theme in unique_themes:
    theme_data = timeline_df[timeline_df['theme'] == theme]
    plt.plot(theme_data['date'], theme_data['percentage'], 
             label=theme, linewidth=2.5, color=theme_colors[theme],
             marker='o', markersize=3)

# First create the plot so the y-axis limits are established
plt.xlabel('Date', fontsize=14)
plt.ylabel('Percentage of Monthly Coverage', fontsize=14)
plt.title('Evolution of Key Media Themes Over Time', fontsize=20, fontweight='bold')
plt.grid(True, alpha=0.3)

# Get y-axis limits *after* the plot is created
y_lim = plt.gca().get_ylim()

# Add election markers with fixed y-position
for election, date in election_dates.items():
    plt.axvline(x=date, color='black', linestyle='--', alpha=0.5)
    # Calculate y position based on current y-axis limits
    y_pos = y_lim[1] * 0.95
    plt.text(date, y_pos, election, rotation=90, ha='right', fontsize=10)

plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=len(unique_themes), fontsize=12, frameon=True)

# Format x-axis date labels
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.show()

Longitudinal Theme Analysis:

This visualization tracks key themes as a percentage of total coverage over time, revealing:

  • How media focus evolves before, during, and after election periods
  • Seasonal patterns in thematic coverage
  • Long-term trends in media priorities
  • The relationship between certain themes and specific elections
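
The month-by-month loop in the timeline cell can also be written as a vectorized pandas pipeline. A minimal sketch, assuming the same parsed_date (datetime) and V2Themes ("CODE,offset;CODE,offset;...") columns; the function name monthly_theme_share is hypothetical. It buckets every theme mention by calendar month, so late-evening articles on the last day are included, and reports each tracked code as a percentage of all mentions that month.

```python
import pandas as pd


def monthly_theme_share(df, theme_codes):
    """Percent of all theme mentions per calendar month for the given GDELT theme codes."""
    mentions = (
        df[["parsed_date", "V2Themes"]]
        .dropna()
        .assign(theme=lambda d: d["V2Themes"].str.split(";"))
        .explode("theme")
    )
    mentions = mentions.loc[mentions["theme"] != ""].copy()  # drop blanks from trailing ';'
    mentions["theme"] = mentions["theme"].str.split(",").str[0]  # strip character offsets
    # Calendar-month bucket (first day of the month)
    mentions["month"] = mentions["parsed_date"].dt.to_period("M").dt.to_timestamp()

    totals = mentions.groupby("month").size()  # all theme mentions per month
    counts = (
        mentions[mentions["theme"].isin(theme_codes)]
        .groupby(["month", "theme"])
        .size()
        .unstack(fill_value=0)
        .reindex(index=totals.index, columns=theme_codes, fill_value=0)
    )
    return counts.div(totals, axis=0) * 100


# Hypothetical toy articles (not from the dataset)
toy = pd.DataFrame({
    "parsed_date": [pd.Timestamp("2020-01-05"), pd.Timestamp("2020-01-31 18:00"),
                    pd.Timestamp("2020-02-10")],
    "V2Themes": ["IMMIGRATION,10;TAX_FNCACT,20;", "IMMIGRATION,5;", "ELECTION,3;TAX_FNCACT,9;"],
})
share = monthly_theme_share(toy, ["IMMIGRATION", "ELECTION"])
print(share)
```

In the toy data, the article published at 18:00 on January 31 is counted in January.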

5 Comparative Analysis of Tone Across Media

Code
import pandas as pd
import glob

def load_media_data(media_name):
    csv_files = glob.glob(f"../data/{media_name}/*.csv")
    df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True)
    df['media'] = media_name
    return df

abc_df = load_media_data('abc')
msnbc_df = load_media_data('msnbc')
fox_df = load_media_data('fox')

combined_df = pd.concat([abc_df, msnbc_df, fox_df], ignore_index=True)

tone_split = combined_df["V2Tone"].str.split(",", expand=True)
combined_df["tone"] = pd.to_numeric(tone_split[0], errors="coerce")
combined_df["parsed_date"] = pd.to_datetime(combined_df["parsed_date"], errors="coerce").dt.tz_localize(None)
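
The cell above keeps only the first comma-separated field of V2Tone. According to the GDELT GKG 2.1 codebook, that tone block holds seven fields in this order: tone, positive score, negative score, polarity, activity reference density, self/group reference density, and word count. A minimal sketch that keeps all seven; the snake_case column names and the helper parse_v2tone are hypothetical labels, not GDELT identifiers.

```python
import pandas as pd

# Hypothetical column labels, in the field order given by the GKG 2.1 codebook
TONE_FIELDS = [
    "tone", "positive_score", "negative_score", "polarity",
    "activity_ref_density", "self_group_ref_density", "word_count",
]


def parse_v2tone(series):
    """Split a V2Tone string column into numeric columns (NaN where missing or unparsable)."""
    parts = series.str.split(",", expand=True)
    parts = parts.iloc[:, :len(TONE_FIELDS)]  # ignore any extra trailing fields
    parts.columns = TONE_FIELDS[:parts.shape[1]]
    return parts.apply(pd.to_numeric, errors="coerce")


# Hypothetical example values, not taken from the dataset
sample = pd.Series(["-2.5,1.8,4.3,6.1,22.4,0.5,350", None])
print(parse_v2tone(sample))
```

Applied to combined_df["V2Tone"], the positive and negative scores could help separate articles that are neutral from ones that are strongly mixed, which a single tone value cannot do.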

5.1 Comparative Visualization of Cross-Media Tone

Code
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['font.family'] = 'sans-serif'

plt.figure(figsize=(12, 6))
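# hue='media' with legend=False replaces the deprecated palette-without-hue usage
sns.boxplot(x='media', y='tone', hue='media', data=combined_df,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'], legend=False)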
plt.title('Distribution of Tone Scores Across News Networks', fontsize=16, fontweight='bold')
plt.xlabel('News Network', fontsize=12)
plt.ylabel('Tone Score', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

combined_df['year_month'] = combined_df['parsed_date'].dt.to_period('M')
media_tone_trend = combined_df.groupby(['year_month', 'media'])['tone'].mean().reset_index()
media_tone_trend['year_month'] = pd.to_datetime(media_tone_trend['year_month'].astype(str))

plt.figure(figsize=(16, 8))
for media, color in zip(['abc', 'msnbc', 'fox'], ['#1f77b4', '#ff7f0e', '#2ca02c']):
    media_data = media_tone_trend[media_tone_trend['media'] == media]
    plt.plot(media_data['year_month'], media_data['tone'], 
             label=media.upper(), color=color, linewidth=2)

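    # Compute the 3-month rolling mean as a standalone Series instead of writing
    # into a slice of media_tone_trend (which triggers SettingWithCopyWarning)
    rolling = media_data['tone'].rolling(window=3, center=True).mean()
    plt.plot(media_data['year_month'], rolling, 
             color=color, linestyle='--', alpha=0.7)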

for label, date in event_dates.items():
    plt.axvline(date, color='gray', linestyle='--', alpha=0.5)
    plt.text(date, media_tone_trend['tone'].min()+0.5, label, 
             rotation=90, va='bottom', fontsize=10)

plt.title('Tone Trend Comparison Across News Networks', fontsize=16, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Average Tone Score', fontsize=12)
plt.legend(title='News Network')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

6 Differences in Theme Coverage Across Networks

6.1 Cross-Media Theme Distribution

Code
def preprocess_themes(df):
    themes = df['V2Themes'].dropna().str.split(';')
    return [theme.split(',')[0] for sublist in themes for theme in sublist if theme]

media_themes = {}
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    themes = preprocess_themes(media_df)
    media_themes[media] = Counter(themes).most_common(20)


theme_comparison = []
for media, themes in media_themes.items():
    for theme, count in themes:
        theme_comparison.append({
            'media': media,
            'theme': theme_name_mapping.get(theme, theme),
            'count': count
        })
        
theme_comparison_df = pd.DataFrame(theme_comparison)

# Themes that appear in the top-20 list of every network; set.intersection avoids
# resetting the running result if an intermediate intersection becomes empty
common_themes = set.intersection(
    *[{t[0] for t in media_themes[media]} for media in ['abc', 'msnbc', 'fox']]
)

print(f"{len(common_themes)} themes appear in the top 20 of all three networks:")
for theme in common_themes:
    print(f"- {theme_name_mapping.get(theme, theme)}")
11 themes appear in the top 20 of all three networks:
- USPEC_POLICY1
- GENERAL_HEALTH
- Government
- Presidents
- TRIAL
- Conflict & Fragility
- Safety
- Crisis Reporting
- WB_696_PUBLIC_SECTOR_MANAGEMENT
- General Politics
- Leaders

6.2 Theme Comparison Visualization

Code
common_theme_counts = theme_comparison_df[theme_comparison_df['theme'].isin(
    [theme_name_mapping.get(t, t) for t in common_themes])]

plt.figure(figsize=(14, 8))
sns.barplot(x='count', y='theme', hue='media', data=common_theme_counts,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Coverage of Common Themes Across Networks', fontsize=16, fontweight='bold')
plt.xlabel('Frequency', fontsize=12)
plt.ylabel('Theme', fontsize=12)
plt.legend(title='News Network')
plt.tight_layout()
plt.show()

# Themes in one network's top 20 that appear in neither other network's top 20
unique_themes = {}
for media in ['abc', 'msnbc', 'fox']:
    other_media = set(['abc', 'msnbc', 'fox']) - {media}
    media_themes_set = set([t[0] for t in media_themes[media]])
    for other in other_media:
        media_themes_set -= set([t[0] for t in media_themes[other]])
    unique_themes[media] = media_themes_set

for media, themes in unique_themes.items():
    if themes:
        print(f"\n{media.upper()} unique themes:")
        for theme in themes:
            print(f"- {theme_name_mapping.get(theme, theme)}")
    else:
        print(f"\n{media.upper()} has no unique themes")


ABC unique themes:
- UNGP_FORESTS_RIVERS_OCEANS
- Government Policy

MSNBC unique themes:
- EPU_POLICY_POLITICAL
- TAX_ETHNICITY_AMERICAN
- EPU_POLICY_WHITE_HOUSE
- TAX_POLITICAL_PARTY_REPUBLICAN
- TAX_POLITICAL_PARTY_REPUBLICANS
- Election
- Democrats
- LEGISLATION

FOX unique themes:
- TAX_WORLDMAMMALS_FOX
- TAX_WORLDMAMMALS_FOX

7 Differences in Media Behavior During Election Cycles

7.1 Tone Changes Before and After Elections

Code
election_shift_results = []
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    for label, date in election_dates.items():
        pre = media_df[(media_df['parsed_date'] >= date - pd.DateOffset(months=3)) & 
                      (media_df['parsed_date'] < date)]
        post = media_df[(media_df['parsed_date'] >= date) & 
                       (media_df['parsed_date'] < date + pd.DateOffset(months=3))]
        
        election_shift_results.append({
            'media': media,
            'election': label,
            'pre_avg_tone': pre['tone'].mean(),
            'post_avg_tone': post['tone'].mean(),
            'tone_shift': post['tone'].mean() - pre['tone'].mean(),
            'pre_articles': len(pre),
            'post_articles': len(post)
        })

election_shift_df = pd.DataFrame(election_shift_results)

plt.figure(figsize=(14, 8))
sns.barplot(x='election', y='tone_shift', hue='media', data=election_shift_df,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.axhline(0, color='black', linewidth=0.5)
plt.title('Tone Shift Before/After Elections by News Network', fontsize=16, fontweight='bold')
plt.xlabel('Election', fontsize=12)
plt.ylabel('Tone Shift (Post - Pre)', fontsize=12)
plt.legend(title='News Network')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

7.2 Theme Preferences in Election Coverage

Code
election_theme_results = []
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    for label, date in election_dates.items():
        # Same half-open window as in 7.1: three months before to three months after
        period_df = media_df[(media_df['parsed_date'] >= date - pd.DateOffset(months=3)) & 
                           (media_df['parsed_date'] < date + pd.DateOffset(months=3))]
        themes = preprocess_themes(period_df)
        for theme, count in Counter(themes).most_common(10):
            election_theme_results.append({
                'media': media,
                'election': label,
                'theme': theme_name_mapping.get(theme, theme),
                'count': count
            })

election_theme_df = pd.DataFrame(election_theme_results)

election_2020 = election_theme_df[election_theme_df['election'] == '2020 Presidential']

plt.figure(figsize=(14, 10))
sns.barplot(x='count', y='theme', hue='media', data=election_2020,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Theme Coverage During 2020 Election by Network', fontsize=16, fontweight='bold')
plt.xlabel('Frequency', fontsize=12)
plt.ylabel('Theme', fontsize=12)
plt.legend(title='News Network')
plt.tight_layout()
plt.show()

8 Advanced Exploratory Analysis

8.1 Time Series Decomposition

Code
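from statsmodels.tsa.seasonal import seasonal_decompose

for media in ['abc', 'msnbc', 'fox']:
    media_tone = combined_df[combined_df['media'] == media].set_index('parsed_date')['tone']
    # Month-end frequency ('ME' replaces the deprecated 'M' alias in recent pandas)
    monthly_tone = media_tone.resample('ME').mean().dropna()
    
    # Additive decomposition with an annual (12-month) seasonal period.
    # Note: dropna() removes empty months, which shifts the seasonal alignment if gaps exist
    decomposition = seasonal_decompose(monthly_tone, model='additive', period=12)
    
    # decomposition.plot() creates its own figure, so resize and title that figure
    # instead of opening an extra empty one with plt.figure()
    fig = decomposition.plot()
    fig.set_size_inches(14, 10)
    fig.suptitle(f'Time Series Decomposition of {media.upper()} Tone', y=1.02, fontsize=16, fontweight='bold')
    fig.tight_layout()
    plt.show()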
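
statsmodels' seasonal_decompose with model='additive' performs a classical moving-average decomposition, in which observed = trend + seasonal + residual. To make the lower panels easier to interpret, here is a simplified re-implementation of that classical approach. It is an illustration under stated assumptions (even period, regularly spaced series with no gaps), not statsmodels' exact code, and edge handling may differ; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(series: pd.Series, period: int = 12):
    """Split a regularly spaced series into trend, seasonal and residual parts."""
    if period % 2:
        raise ValueError("this sketch only handles an even period")
    values = series.to_numpy(dtype=float)
    half = period // 2

    # Trend: centered 2 x period moving average (the two end points get half weight)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    core = np.convolve(values, weights, mode="valid")
    trend = pd.Series(np.r_[np.full(half, np.nan), core, np.full(half, np.nan)],
                      index=series.index)

    # Seasonal: mean detrended value at each position in the cycle,
    # re-centered so the seasonal effects sum to zero over one period
    position = np.arange(len(values)) % period
    detrended = series - trend
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)

    resid = series - trend - seasonal
    return trend, seasonal, resid


# Synthetic example: linear drift plus a 12-month cycle, no noise
t = np.arange(48)
y = pd.Series(0.05 * t + np.sin(2 * np.pi * t / 12),
              index=pd.date_range("2016-01-01", periods=48, freq="MS"))
trend, seasonal, resid = classical_additive_decompose(y)
```

On this noise-free series the moving average recovers the linear drift, the seasonal component recovers the sine wave, and the residual is essentially zero; the first and last six months have no trend estimate, which is why the trend and residual panels above start and end with gaps.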

9 Network Analysis Visualization

9.1 Theme Co-occurrence Network

Code
import networkx as nx
from itertools import combinations


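def build_cooccurrence_network(df, top_n_themes=30):
    """Build a weighted graph of how often the top themes appear in the same article."""
    themes = preprocess_themes(df)
    top_themes = {t for t, _ in Counter(themes).most_common(top_n_themes)}  # set for fast lookups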
    
    cooccur = {}
    for themes_list in df['V2Themes'].dropna().str.split(';'):
        themes_in_article = [t.split(',')[0] for t in themes_list if t]
        themes_in_article = [t for t in themes_in_article if t in top_themes]
        for pair in combinations(set(themes_in_article), 2):
            sorted_pair = tuple(sorted(pair))
            cooccur[sorted_pair] = cooccur.get(sorted_pair, 0) + 1
    
    G = nx.Graph()
    for (t1, t2), weight in cooccur.items():
        G.add_edge(theme_name_mapping.get(t1, t1), 
                  theme_name_mapping.get(t2, t2), 
                  weight=weight)
    return G

media_graphs = {}
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    media_graphs[media] = build_cooccurrence_network(media_df)

plt.figure(figsize=(14, 12))
G = media_graphs['abc']
pos = nx.spring_layout(G, k=0.3, iterations=50)
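# Scale edge widths to at most 5 points; raw co-occurrence counts grow with corpus size
max_weight = max(d['weight'] for _, _, d in G.edges(data=True))
weights = [5 * G[u][v]['weight'] / max_weight for u, v in G.edges()]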
nx.draw_networkx(G, pos, with_labels=True, node_size=800, 
                node_color='skyblue', font_size=10, 
                width=weights, edge_color='gray')
plt.title('ABC News Theme Co-occurrence Network', fontsize=16, fontweight='bold')
plt.axis('off')
plt.tight_layout()
plt.show()
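
The edge weights above count how many articles mention both themes. A standalone sketch of that counting step, assuming raw V2Themes strings as input; the function name cooccurrence_counts and the toy rows are hypothetical. Like build_cooccurrence_network, it counts each unordered pair at most once per article, even when a theme is mentioned repeatedly.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(theme_strings, keep=None):
    """Count article-level co-occurrences of GDELT theme codes.

    theme_strings: iterable of raw V2Themes values ('CODE,offset;CODE,offset;...').
    keep: optional collection of codes to restrict to (e.g. the top-N themes).
    """
    keep = set(keep) if keep is not None else None
    counts = Counter()
    for raw in theme_strings:
        if not isinstance(raw, str):
            continue  # skip missing V2Themes values (NaN/None)
        # Strip character offsets and deduplicate repeated mentions within the article
        codes = {item.split(",")[0] for item in raw.split(";") if item}
        if keep is not None:
            codes &= keep
        # Sorting gives each unordered pair one canonical key, e.g. ('A', 'B')
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts


rows = ["A,1;B,5;A,9;", "A,2;C,3;", None, "B,1;A,4;C,7;"]
print(cooccurrence_counts(rows))
```

Keeping the counting separate from the graph construction makes it easy to test, and the resulting Counter can be passed to nx.Graph via add_edge(t1, t2, weight=count).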

10 Conclusion

10.1 Key Findings

Our analysis of ABC News coverage across five election cycles reveals several notable patterns:

  1. Tone Shifts: All five elections showed a positive tone shift in the post-election period compared to pre-election coverage.

  2. Thematic Evolution: Election coverage transitions from campaign-focused themes before elections to governance and policy themes afterward.

  3. Consistent Themes: Presidential leadership, immigration, and general government operations persist as dominant themes across all periods.

  4. Temporal Patterns: Media tone shows recurring patterns around election dates, suggesting that electoral politics may influence news sentiment.

10.2 Methodological Notes

  • GDELT’s tone scores can range from -100 (extremely negative) to +100 (extremely positive), though in practice most values fall between -10 and +10
  • Most news content clusters between -5 and +1, with ABC News averaging around -2.7
  • Theme extraction uses GDELT’s thematic coding system, mapped to reader-friendly names
  • Statistical significance was assessed using two-sample t-tests with unequal variances
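
The two-sample t-test with unequal variances named above is Welch's t-test: t = (mean_post - mean_pre) / sqrt(s_post²/n_post + s_pre²/n_pre), with degrees of freedom from the Welch-Satterthwaite approximation. A minimal sketch that writes out each term; scipy's ttest_ind(post, pre, equal_var=False), imported at the top of this notebook, returns the same t and p. The helper name and the toy scores are hypothetical.

```python
import numpy as np
from scipy import stats


def welch_t_test(post, pre):
    """Two-sided Welch's t-test for a difference in mean tone (post minus pre)."""
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    post, pre = post[~np.isnan(post)], pre[~np.isnan(pre)]  # drop unparsed tones

    n1, n2 = len(post), len(pre)
    v1, v2 = post.var(ddof=1) / n1, pre.var(ddof=1) / n2  # squared standard errors
    t = (post.mean() - pre.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p


# Hypothetical tone scores, not taken from the dataset
post = [-1.0, -2.0, 0.5, -0.5, -1.5]
pre = [-3.0, -2.5, -4.0, -2.0, -3.5, -2.8]
t, dof, p = welch_t_test(post, pre)
print(f"t = {t:.2f}, df = {dof:.1f}, p = {p:.4f}")
```

Welch's version is a reasonable default here because pre- and post-election windows typically differ in article counts and may differ in tone variance.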

10.3 Future Research Directions

This analysis could be extended by:

  • Extending the cross-network comparison beyond MSNBC and Fox News to additional outlets
  • Examining coverage of specific politicians or policies across electoral periods
  • Analyzing article-level data for more granular insights
  • Incorporating textual analysis techniques to explore narrative framing